
Write out job metadata to a file in the same directory as refined.cif #132

Merged

k-chrispens merged 2 commits into main from mdc-write-metadata on Mar 5, 2026

Conversation

marcuscollins (Collaborator) commented Mar 5, 2026

Behold, the world's simplest PR. Just write out some metadata so it will be easier to process the results of our occupancy sweeps.

This starts to address #121

Summary by CodeRabbit

  • New Features
    • Guidance jobs now write a job_metadata.json file into each job's output directory after completion.
    • Output directories are now created automatically as needed before jobs run.
  • Bug Fixes
    • Improved reliability of per-job metadata persistence and isolation so results and metadata are consistently saved.
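The feature summarized above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the `Job` fields and the `write_job_metadata` helper are hypothetical stand-ins for whatever job object `run_guidance_job_queue` actually serializes.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class Job:
    # Illustrative job parameters; the real job type in the repo differs.
    name: str
    occupancy: float


def write_job_metadata(job: Job, output_dir: Path) -> Path:
    """Write the job's parameters to job_metadata.json in its output directory."""
    # Create the output directory automatically if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = output_dir / "job_metadata.json"
    with metadata_path.open("w", encoding="utf-8") as fp:
        json.dump(asdict(job), fp, indent=2)
    return metadata_path
```

With a metadata file sitting next to each job's refined.cif, post-processing a sweep reduces to globbing for `job_metadata.json` and loading the parameters back with `json.load`.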

coderabbitai (Bot, Contributor) commented Mar 5, 2026

📝 Walkthrough

Added an import for json, ensured the guidance output directory is created when running, and persisted per-job metadata by writing a job_metadata.json file (serialized from the job object) into each job's output directory after executing the job. Removed an inactive commented-out block about wrapper reuse.

Changes

Cohort / File(s): Guidance script updates — src/sampleworks/utils/guidance_script_utils.py
Summary: Added import json; ensure args.output_dir exists in run_guidance; after each job in run_guidance_job_queue, serialize the job object and write job_metadata.json to the job's output directory; removed the commented-out wrapper-reuse block.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 I hopped through code and left a mark,
A little JSON glowing in the dark,
Each job recorded, tidy and neat,
A rabbit's whisper in bytes so sweet,
Hop on—our runs are safe and stark. 🌙

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately describes the main change, writing job metadata to a file in the output directory alongside refined results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mdc-write-metadata


coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/sampleworks/utils/guidance_script_utils.py (1)

584-584: Remove the redundant inline comment.

Line 584 restates what the code already makes clear; dropping it keeps this file closer to the repo’s “direct, readable” style.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sampleworks/utils/guidance_script_utils.py` at line 584, Remove the
redundant inline comment that repeats the code’s action ("# write out the job
parameters to a JSON file in the same directory as the refined.cif file");
delete that comment line near the code that writes job parameters to JSON (the
block referencing refined.cif/job parameters) so the implementation (in
guidance_script_utils.py) remains direct and uncluttered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sampleworks/utils/guidance_script_utils.py`:
- Around line 585-586: Ensure the metadata write is resilient by creating the
parent directory for job_result.output_dir
(Path(job_result.output_dir).mkdir(parents=True, exist_ok=True)) before opening
"job_metadata.json", and protect the json.dump with a try/except so one bad job
doesn't abort the queue: build a serializable copy of job.__dict__ (e.g.,
convert GuidanceConfig.model and GuidanceConfig.guidance_type to simple
serializable values like their class name or str(repr(...))), use json.dump on
that safe dict (or json.dump(..., default=str) to fallback for non-serializable
objects), and on any exception log the error and continue rather than
re-raising; these changes should be applied where run_guidance() writes job
metadata so the queue keeps processing even if serialization or missing
directories occur.
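The resilience pattern this prompt describes can be sketched as below. This is a hedged illustration under stated assumptions: `dump_metadata_safely` and its arguments are hypothetical names, and `default=str` is one simple way to handle non-serializable fields such as enum or model objects.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def dump_metadata_safely(metadata: dict, output_dir: Path) -> bool:
    """Best-effort metadata write: logs and returns False on failure, never raises."""
    try:
        # Create the parent directory first so the open() cannot fail on a missing path.
        output_dir.mkdir(parents=True, exist_ok=True)
        with (output_dir / "job_metadata.json").open("w", encoding="utf-8") as fp:
            # default=str stringifies values json cannot serialize natively
            # (e.g. enum members or arbitrary objects) instead of raising TypeError.
            json.dump(metadata, fp, indent=2, default=str)
        return True
    except (OSError, TypeError) as exc:
        logger.warning("Failed to write job metadata to %s: %s", output_dir, exc)
        return False
```

Because the helper swallows only I/O and serialization errors and reports success via its return value, a queue loop calling it can keep processing the remaining jobs even when one metadata write fails.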


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f0036a2-9da3-4d26-8cf3-470d5cd6e216

📥 Commits

Reviewing files that changed from the base of the PR and between 9ef3393 and f8732e9.

📒 Files selected for processing (1)
  • src/sampleworks/utils/guidance_script_utils.py

Comment thread src/sampleworks/utils/guidance_script_utils.py
coderabbitai (Bot, Contributor) left a comment

♻️ Duplicate comments (1)
src/sampleworks/utils/guidance_script_utils.py (1)

584-587: ⚠️ Potential issue | 🟠 Major

Previously flagged issues remain unresolved; additionally, consider including execution metadata.

The directory creation and serialization concerns from the previous review are still valid and unaddressed:

  1. output_dir may not exist if _run_guidance() fails before save_everything() runs
  2. json.dump(job.__dict__, ...) will raise TypeError for non-serializable fields (e.g., GuidanceType, StructurePredictor objects)
  3. Any exception aborts the entire job queue

New observation: Writing only job.__dict__ omits valuable execution results. Per the JobResult dataclass, fields like status, exit_code, runtime_seconds, started_at, and finished_at are only in job_result, not in job. Consider merging both for complete metadata.

Proposed fix incorporating both concerns
-        # write out the job parameters to a JSON file in the same directory as the refined.cif file
-        with open(Path(job_result.output_dir) / "job_metadata.json", "w") as fp:
-            json.dump(job.__dict__, fp)
+        # write out the job parameters and execution results to a JSON file
+        metadata_path = Path(job_result.output_dir) / "job_metadata.json"
+        try:
+            metadata_path.parent.mkdir(parents=True, exist_ok=True)
+            combined_metadata = {
+                "config": job.__dict__,
+                "result": job_result.__dict__,
+            }
+            with metadata_path.open("w", encoding="utf-8") as fp:
+                json.dump(combined_metadata, fp, indent=2, default=str)
+        except (OSError, TypeError) as exc:
+            logger.warning(f"Failed to write job metadata to {metadata_path}: {exc}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sampleworks/utils/guidance_script_utils.py` around lines 584 - 587,
Ensure the job metadata write is robust: before opening
Path(job_result.output_dir) / "job_metadata.json" ensure the directory exists
(create parents with exist_ok=True) and perform the write atomically to avoid
partial files; merge serializable representations of job and job_result (use
job.__dict__ plus JobResult fields like status, exit_code, runtime_seconds,
started_at, finished_at) into a single dict; convert or filter non-serializable
values from GuidanceType/StructurePredictor (e.g., replace with their names/IDs
or call a to_dict() if available) or provide a default JSON serializer to avoid
TypeError when json.dump is called; wrap the file write in a try/except that
logs the exception (reference process/logger used elsewhere) instead of raising
so a single metadata write failure does not abort the entire job queue (locate
code around _run_guidance(), save_everything(), job_result.output_dir, and the
JobResult dataclass to implement these changes).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a49d25b9-96c1-4beb-a640-e61b36e13275

📥 Commits

Reviewing files that changed from the base of the PR and between f8732e9 and 3239abc.

📒 Files selected for processing (1)
  • src/sampleworks/utils/guidance_script_utils.py

coderabbitai (Bot, Contributor) left a comment

♻️ Duplicate comments (1)
src/sampleworks/utils/guidance_script_utils.py (1)

587-590: ⚠️ Potential issue | 🟠 Major

Prevent metadata-write I/O failures from aborting the remaining job queue.

If this write hits an OSError (e.g., permission/disk issues), the loop exits and skips remaining jobs. Since metadata is auxiliary, handle write failures and continue.

Proposed fix
-        # write out the job parameters to a JSON file in the same directory as the refined.cif file
-        with open(Path(job_result.output_dir) / "job_metadata.json", "w") as fp:
-            json.dump(job.__dict__, fp)
+        # write out the job parameters to a JSON file in the same directory as the refined.cif file
+        metadata_path = Path(job_result.output_dir) / "job_metadata.json"
+        metadata_path.parent.mkdir(parents=True, exist_ok=True)
+        try:
+            with metadata_path.open("w", encoding="utf-8") as fp:
+                json.dump(job.__dict__, fp)
+        except OSError as exc:
+            logger.warning(f"Failed to write job metadata to {metadata_path}: {exc}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sampleworks/utils/guidance_script_utils.py` around lines 587 - 590, The
metadata write (opening Path(job_result.output_dir) / "job_metadata.json" and
json.dump(job.__dict__, fp)) can raise OSError and currently aborts the job
loop; wrap the open/json.dump in a try/except that catches OSError (or OSError
and IOError for compatibility), log the error with context including
job_result.output_dir and job identifier (e.g., job.id or other distinguishing
field) using the module's logger, and continue without re-raising so remaining
jobs are processed. Ensure the except only swallows I/O-related exceptions and
does not hide other failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc7d3025-c3e5-4c90-a988-b1b9c9740d04

📥 Commits

Reviewing files that changed from the base of the PR and between 3239abc and d60f807.

📒 Files selected for processing (1)
  • src/sampleworks/utils/guidance_script_utils.py

k-chrispens (Collaborator) left a comment

looks good!

@k-chrispens k-chrispens merged commit b9005c9 into main Mar 5, 2026
1 check passed
@k-chrispens k-chrispens deleted the mdc-write-metadata branch March 5, 2026 19:21